Understanding crime dynamics is essential for effective public safety strategies. In recent years, researchers have increasingly explored how external factors, such as weather conditions, may influence criminal activity. This report examines how weather variables relate to recorded crime in Colchester, Essex, during 2024.
This investigation uses two datasets: crime24.csv, which includes monthly street-level crime incidents in Colchester (e.g., crime type, location, outcome), and temp24.csv, which provides daily weather data such as temperature, humidity, pressure, and precipitation from a nearby station.
The analysis begins by independently exploring both crime and climate datasets to identify seasonal, temporal, and geographic patterns. These datasets are then merged to examine how fluctuations in temperature, humidity, or pressure align with changes in crime volume or type. A variety of visualisation techniques, including time series, scatter plots, and spatial mapping, are applied to highlight key patterns and associations. The ultimate goal is to derive practical, actionable recommendations for local law enforcement based on data-informed insights.
Are crime levels influenced by weather in Colchester?
To answer this, we’ll perform:
- Exploratory data analysis and visualisation of both datasets
- Time series analysis and smoothing
- Mapping and spatial insight using Leaflet
- Correlation and clustering techniques to explore links between climate and crime
Through this investigation, we aim to uncover trends that may assist in strategic policing, crime prevention, and public awareness campaigns tailored to seasonal or climatic patterns.
### Load required packages
library(dplyr)
library(ggplot2)
library(lubridate)
library(readr)
library(forcats)
library(leaflet)
library(plotly)
library(tidyr)
library(reshape2)
library(patchwork)
library(viridis)
library(gganimate)
# Load the crime dataset from CSV file
crime24 <- read_csv("crime24.csv", show_col_types = FALSE)
## New names:
## • `` -> `...1`
# Display the first few rows of the dataset
as.data.frame(head(crime24))
## ...1 category persistent_id date lat long street_id
## 1 1 anti-social-behaviour <NA> 2024-01 51.89301 0.901028 2153130
## 2 2 anti-social-behaviour <NA> 2024-01 51.88979 0.898830 2153105
## 3 3 anti-social-behaviour <NA> 2024-01 51.89825 0.902107 2153147
## 4 4 anti-social-behaviour <NA> 2024-01 51.87837 0.888373 2152856
## 5 5 anti-social-behaviour <NA> 2024-01 51.87905 0.889521 2152871
## 6 6 anti-social-behaviour <NA> 2024-01 51.88860 0.899203 2153107
## street_name context id location_type
## 1 On or near Middle Mill NA 115967607 Force
## 2 On or near Conference/exhibition Centre NA 115967129 Force
## 3 On or near Mason Road NA 115967591 Force
## 4 On or near Kensington Road NA 115967062 Force
## 5 On or near Lambeth Road NA 115967058 Force
## 6 On or near Trinity Street NA 115967547 Force
## location_subtype outcome_status
## 1 <NA> <NA>
## 2 <NA> <NA>
## 3 <NA> <NA>
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 <NA> <NA>
# Load the weather dataset from its CSV file
temp24 <- read_csv("temp24.csv", show_col_types = FALSE)
# Display the first few rows of the dataset
as.data.frame(head(temp24))
## station_ID Date TemperatureCAvg TemperatureCMax TemperatureCMin TdAvgC
## 1 3590 2024-12-31 6.5 7.7 5.0 4.4
## 2 3590 2024-12-30 5.6 6.9 3.4 4.9
## 3 3590 2024-12-29 3.3 4.9 2.2 3.2
## 4 3590 2024-12-28 4.0 5.8 2.3 3.7
## 5 3590 2024-12-27 5.3 6.7 4.3 5.1
## 6 3590 2024-12-26 6.7 10.0 5.6 6.4
## HrAvg WindkmhDir WindkmhInt WindkmhGust PresslevHp Precmm TotClOct lowClOct
## 1 86.4 WSW 22.7 42.6 1025.3 0.0 4.5 7.2
## 2 94.9 WSW 16.7 40.8 1028.5 0.0 8.0 8.0
## 3 98.6 W 11.4 22.2 1028.5 0.4 8.0 8.0
## 4 98.4 SW 5.5 14.8 1031.8 0.4 8.0 8.0
## 5 98.4 S 6.3 16.7 1034.7 0.4 8.0 8.0
## 6 98.3 WSW 9.3 22.2 1033.6 0.4 8.0 8.0
## SunD1h VisKm SnowDepcm PreselevHp
## 1 5.7 63.4 NA NA
## 2 0.0 15.3 NA NA
## 3 0.0 0.5 NA NA
## 4 0.0 0.1 NA NA
## 5 0.0 0.5 NA NA
## 6 0.0 0.2 NA NA
# Fix incomplete date format by adding a day to form valid dates (e.g., "2024-01" -> "01-2024-01")
crime24$date <- paste0("01-", crime24$date)
crime24$date <- as.Date(crime24$date, format = "%d-%Y-%m")
# Extract full month name from the date
crime24$month <- month(crime24$date, label = TRUE, abbr = FALSE)
# Assign a season based on the month number
crime24$season <- case_when(
month(crime24$date) %in% c(12, 1, 2) ~ "Winter", # December–February
month(crime24$date) %in% c(3, 4, 5) ~ "Spring", # March–May
month(crime24$date) %in% c(6, 7, 8) ~ "Summer", # June–August
month(crime24$date) %in% c(9, 10, 11) ~ "Autumn" # September–November
)
# Remove unused columns with no analytical value
crime24 <- crime24 %>% select(-c(context, location_subtype))
# Replace missing values in outcome_status with "Unknown"
crime24$outcome_status <- ifelse(is.na(crime24$outcome_status), "Unknown", crime24$outcome_status)
# Display a count of missing values by column
colSums(is.na(crime24))
## ...1 category persistent_id date lat
## 0 0 732 0 0
## long street_id street_name id location_type
## 0 0 0 0 0
## outcome_status month season
## 0 0 0
# Convert the 'Date' column to proper Date format
temp24$Date <- as.Date(temp24$Date)
# Extract the full month name from the date
temp24$month <- month(temp24$Date, label = TRUE, abbr = FALSE)
# Assign a season based on the month number
temp24$season <- case_when(
month(temp24$Date) %in% c(12, 1, 2) ~ "Winter", # December–February
month(temp24$Date) %in% c(3, 4, 5) ~ "Spring", # March–May
month(temp24$Date) %in% c(6, 7, 8) ~ "Summer", # June–August
month(temp24$Date) %in% c(9, 10, 11) ~ "Autumn" # September–November
)
# Remove unnecessary columns with excessive missing values
temp24 <- temp24 %>% select(-c(PreselevHp, SnowDepcm))
# Replace missing values in rainfall and low cloud cover columns with 0
temp24$Precmm[is.na(temp24$Precmm)] <- 0
temp24$lowClOct[is.na(temp24$lowClOct)] <- 0
# Display a count of remaining missing values by column
colSums(is.na(temp24))
## station_ID Date TemperatureCAvg TemperatureCMax TemperatureCMin
## 0 0 0 0 0
## TdAvgC HrAvg WindkmhDir WindkmhInt WindkmhGust
## 0 0 0 0 0
## PresslevHp Precmm TotClOct lowClOct SunD1h
## 0 0 0 0 0
## VisKm month season
## 0 0 0
# Create a bar plot showing the distribution of crime categories, ordered by frequency
crime_category_plot <- ggplot(crime24, aes(x = fct_infreq(category), fill = category)) +
geom_bar(color = "black") + # Add black borders to bars
labs(title = "Distribution of Crime Categories",
x = "Crime Category", y = "Frequency") +
theme_minimal() + # Apply a clean, minimal theme
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for readability
# Convert the static ggplot to an interactive plotly version
ggplotly(crime_category_plot)
The interactive bar plot shows that violent crime is the most frequently reported offence in Colchester, followed by anti-social behaviour. Other crimes like shoplifting and criminal damage occur moderately, while robbery and possession of weapons are less common. Overall, violence and anti-social behaviour dominate the crime profile in 2024.
# Create a bar plot showing the distribution of crime outcome statuses, ordered by frequency
outcome_plot <- ggplot(crime24, aes(x = fct_infreq(outcome_status), fill = outcome_status)) +
geom_bar(color = "black") + # Add black outline around bars
labs(title = "Outcome Status of Reported Crimes",
x = "Outcome Status", y = "Count") +
theme_minimal() + # Use a clean minimal theme
theme(
axis.text.x = element_text(angle = 70, hjust = 1), # Tilt x-axis labels for better readability
axis.title = element_text(size = 12), # Set font size for axis titles
plot.title = element_text(size = 16, face = "bold") # Make plot title larger and bold
) +
scale_fill_manual(values = rainbow(length(unique(crime24$outcome_status)))) + # Use a distinct color for each outcome
guides(fill = guide_legend(title = "Outcome Status")) # Add a legend title
# Convert the static plot to an interactive plotly version
ggplotly(outcome_plot)
The bar plot reveals that most reported crimes result in no suspect being identified, followed by cases where suspects cannot be prosecuted. A notable portion of cases remains under investigation or await court outcomes, while fewer end in formal charges or other resolutions. This suggests that many crimes in Colchester do not lead to prosecution, highlighting challenges in investigation and legal follow-through.
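To put numbers on words like "most" and "fewer", the outcome counts can be converted into percentage shares. The following is a minimal sketch on a small made-up data frame: `toy_crimes` and its outcome labels are hypothetical and are not taken from crime24.csv. Applied to `crime24`, the same pipeline would tabulate `outcome_status`.

```r
library(dplyr)

# Hypothetical toy data standing in for crime24; labels and counts are illustrative only
toy_crimes <- data.frame(
  outcome_status = c("Unknown", "No suspect identified", "No suspect identified",
                     "Unable to prosecute suspect", "Under investigation",
                     "No suspect identified")
)

# Count each outcome, sort by frequency, and express it as a percentage of all records
outcome_share <- toy_crimes %>%
  count(outcome_status, name = "n_crimes", sort = TRUE) %>%
  mutate(share_pct = round(100 * n_crimes / sum(n_crimes), 1))

print(outcome_share)
```

Reporting shares alongside the bar chart makes statements about the dominant outcome checkable without reading values off the plot.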
# Create a bar chart showing the number of crimes by location type
location_plot <- ggplot(crime24, aes(x = location_type, fill = location_type)) +
geom_bar(color = "black") + # Add black borders to each bar
labs(title = "Crimes by Location Type",
x = "Location Type", y = "Number of Crimes") +
theme_minimal() # Apply a clean, minimal visual theme
# Convert the ggplot chart into an interactive Plotly version
ggplotly(location_plot)
The bar chart highlights that nearly all crimes are recorded under the local police force (“Force”), with very few linked to the British Transport Police (“BTP”). This suggests that most recorded incidents occur in community settings rather than transport environments, emphasising the primary role of local policing in managing crime.
# Identify the top 10 streets with the highest number of crimes
top_streets <- crime24 %>%
group_by(street_name) %>%
summarise(crime_count = n()) %>%
arrange(desc(crime_count)) %>%
slice_head(n = 10)
# Filter the original crime dataset to include only those top 10 streets
crime_top <- crime24 %>% filter(street_name %in% top_streets$street_name)
# Create a stacked bar plot showing crime type distribution across the top 10 streets
hotspot_plot <- ggplot(crime_top, aes(x = street_name, fill = category)) +
geom_bar() + # Plot counts by street and crime category
labs(title = "Crime Type Distribution in Top 10 Streets",
x = "Street", y = "Number of Crimes") +
scale_fill_viridis_d(option = "plasma") + # Apply a visually appealing color palette
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Tilt x-axis labels for readability
# Make the chart interactive with Plotly
ggplotly(hotspot_plot)
The bar plot shows that “On or near Supermarket” and “On or near Shopping Area” are the top crime hotspots, largely driven by shoplifting. In contrast, locations like nightclubs and police stations show a more diverse mix of crimes, including violent offences and public disorder. This suggests that crime patterns vary by location, with retail areas experiencing targeted offences and other zones showing broader criminal activity.
# Create an interactive leaflet map to display crime locations across Colchester
leaflet(crime24) %>%
addTiles() %>% # Add default OpenStreetMap tile layer as the base map
addCircleMarkers(
lng = ~long, lat = ~lat, # Use longitude and latitude for marker placement
radius = 3, # Set marker size
color = "blue", # Use blue for marker color
stroke = FALSE, # Remove border around circles
fillOpacity = 0.6 # Set transparency for better visibility
) %>%
setView( # Center the map view on the average location of all crimes
lng = mean(crime24$long, na.rm = TRUE),
lat = mean(crime24$lat, na.rm = TRUE),
zoom = 13 # Set zoom level for city-scale view
)
The interactive leaflet map reveals that central Colchester, particularly around major roads and commercial zones, is the main crime hotspot. Areas like Cymbeline Way and the town centre show the highest concentration of incidents, while outer residential areas see far fewer. This indicates that crime is closely tied to high-traffic, public-facing spaces.
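With several thousand semi-transparent markers, overlapping points can hide how many incidents sit at one location, especially since the "On or near" street names suggest that records share snapped coordinates. One option, sketched below, is leaflet's marker clustering. The six coordinates are the rows printed by `head(crime24)` earlier; the object names `pts` and `crime_cluster_map` are illustrative, and this is not the map used in the report.

```r
library(leaflet)

# First six coordinates shown in the head(crime24) output above
pts <- data.frame(
  lat  = c(51.89301, 51.88979, 51.89825, 51.87837, 51.87905, 51.88860),
  long = c(0.901028, 0.898830, 0.902107, 0.888373, 0.889521, 0.899203)
)

# Same circle markers as before, but nearby points are grouped into count bubbles
crime_cluster_map <- leaflet(pts) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~long, lat = ~lat,
    radius = 3, stroke = FALSE, fillOpacity = 0.6,
    clusterOptions = markerClusterOptions()
  )
```

Printing `crime_cluster_map` renders the widget; zooming in splits each cluster into its individual markers.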
# Ensure the 'date' column is in proper Date format
crime24$date <- as.Date(crime24$date)
# Create a new variable representing month and year (e.g., "2024-03")
crime24$month_year <- format(crime24$date, "%Y-%m")
# Group the dataset by month and count the number of crimes in each
monthly_crimes <- crime24 %>%
group_by(month_year) %>%
summarise(total_crimes = n()) %>%
mutate(month_year = as.Date(paste0(month_year, "-01"))) # Convert to Date for plotting
# Generate a line plot showing monthly crime trends, with a loess smoothing line
monthly_trend_plot <- ggplot(monthly_crimes, aes(x = month_year, y = total_crimes)) +
geom_line(color = "steelblue", linewidth = 1) + # Line for monthly crime totals
geom_smooth(method = "loess", se = FALSE, color = "darkred", linetype = "dashed") + # Smoothed trend line
labs(title = "Monthly Crime Trend in Colchester (2024)",
x = "Month", y = "Number of Crimes") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
# Convert to an interactive Plotly plot
ggplotly(monthly_trend_plot)
## `geom_smooth()` using formula = 'y ~ x'
The line chart shows that crime in Colchester peaked mid-year, with the highest levels in July, and declined toward the end of 2024. Early months like January and February also saw elevated activity, while April marked a low point. The overall trend suggests a seasonal pattern, with increased crime during warmer months, possibly due to higher public activity.
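Because months differ in length (February 2024 has 29 days, July has 31), raw monthly totals can overstate or understate differences. A hedged sketch of a per-day adjustment follows. It uses only the January to June totals that appear in the merged table printed later in the report, so it is an illustration rather than the full-year series; the data frame name `monthly` and the column names are assumptions.

```r
library(lubridate)

# January to June 2024 totals, copied from the merged crime/weather table in this report
monthly <- data.frame(
  month_start  = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01",
                           "2024-04-01", "2024-05-01", "2024-06-01")),
  total_crimes = c(529, 546, 502, 471, 568, 490)
)

# Number of days in each month, then crimes per day
monthly$days <- unname(days_in_month(monthly$month_start))
monthly$crimes_per_day <- monthly$total_crimes / monthly$days

print(monthly)
```

Plotting `crimes_per_day` instead of `total_crimes` keeps the trend comparable across months of different length.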
# Group crime data by season and count total crimes in each
seasonal_crimes <- crime24 %>%
group_by(season) %>%
summarise(total_crimes = n()) %>%
arrange(desc(total_crimes)) # Order seasons by crime count (descending)
# Create a bar plot showing the number of crimes in each season
season_plot <- ggplot(seasonal_crimes, aes(x = season, y = total_crimes, fill = season)) +
geom_bar(stat = "identity", color = "black") + # Use black borders on bars
scale_fill_manual(values = c("Winter" = "#4682B4", "Spring" = "#90EE90",
"Summer" = "#FFD700", "Autumn" = "#FF8C00")) + # Assign custom colors
labs(title = "Crime Count Across Seasons (2024)",
x = "Season", y = "Number of Crimes") +
theme_minimal() # Apply a clean visual style
# Convert to an interactive plot using Plotly
ggplotly(season_plot)
The bar chart shows that Summer had the highest crime levels in Colchester during 2024, followed by Autumn, while Spring recorded the fewest crimes. This suggests that crime increases slightly during the warmest months, likely due to greater public activity. However, because Spring rather than Winter is the low point, the counts only partly track temperature. Overall, there is a modest seasonal pattern in crime counts.
# Count the number of crimes grouped by both season and category
crime_season_heatmap <- crime24 %>%
group_by(season, category) %>%
summarise(crime_count = n(), .groups = 'drop') # Drop grouping after summarising
# Create a heatmap to show the distribution of crime types across seasons
heatmap_plot <- ggplot(crime_season_heatmap, aes(x = season, y = category, fill = crime_count)) +
geom_tile(color = "white") + # Draw rectangles with white borders
scale_fill_gradient(low = "#ffffcc", high = "#006837", name = "Crime Count") + # Color gradient
labs(title = "Crime Types by Season in Colchester (2024)",
x = "Season", y = "Crime Category") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1), # Rotate x-axis labels
plot.title = element_text(size = 16, face = "bold"), # Style the title
legend.position = "right" # Position the legend
)
# Convert the heatmap to an interactive version
ggplotly(heatmap_plot)
The heatmap shows that violent crime is consistently the most reported offence across all seasons in Colchester, with slight peaks in summer and winter. Other common crimes like anti-social behaviour and shoplifting occur steadily year-round, while less frequent crimes show little seasonal change. Overall, crime patterns remain fairly stable throughout the year, with only modest seasonal variation.
# Reshape temperature data into a tidy (long) format for plotting
temp_long <- temp24 %>%
select(Date, TemperatureCAvg, TemperatureCMax, TemperatureCMin) %>%
pivot_longer(
cols = c(TemperatureCAvg, TemperatureCMax, TemperatureCMin), # Select columns to reshape
names_to = "Metric", # New column for metric names (Avg, Max, Min)
values_to = "Temperature" # New column for temperature values
)
# Create a time series line plot of average, max, and min daily temperatures
temp_trend_plot <- ggplot(temp_long, aes(x = Date, y = Temperature, color = Metric)) +
geom_line(linewidth = 0.7) + # Use thin lines for clarity
labs(title = "Temperature Trends in Colchester (2024)",
x = "Date", y = "Temperature (°C)", color = "Type") +
scale_color_manual(values = c( # Assign specific colors for each metric
"TemperatureCAvg" = "blue",
"TemperatureCMax" = "red",
"TemperatureCMin" = "green"
)) +
theme_minimal() # Apply a clean visual theme
# Convert the plot to an interactive version with Plotly
ggplotly(temp_trend_plot)
The line chart shows a clear seasonal temperature cycle in Colchester during 2024. Temperatures rise from winter to summer, peaking in July and August, then gradually decline toward the year’s end. Maximum, average, and minimum temperatures follow consistent patterns, reflecting typical UK climate trends. This seasonal variation provides useful context for understanding how weather may influence crime activity.
# Create a violin plot to show the distribution of average daily temperatures
temp_violin_plot <- ggplot(temp24, aes(y = TemperatureCAvg, x = "")) +
geom_violin(fill = "#87CEFA", color = "black") + # Sky blue fill with black outline
labs(
title = "Distribution of Average Daily Temperature",
y = "Average Temperature (°C)",
x = ""
) +
theme_minimal() # Apply a clean, minimal visual theme
# Convert the static violin plot to an interactive version using Plotly
ggplotly(temp_violin_plot)
The violin plot shows that most average daily temperatures in Colchester during 2024 fell between 7°C and 18°C, with the highest concentration around 12–14°C. Extreme temperatures were rare, with few days below 5°C or above 20°C. Overall, this indicates a mild and stable climate throughout the year.
# Calculate monthly average, maximum, and minimum temperatures
monthly_temp <- temp24 %>%
mutate(month = month(Date, label = TRUE, abbr = FALSE)) %>%
group_by(month) %>%
summarise(
AvgTemp = mean(TemperatureCAvg, na.rm = TRUE),
MaxTemp = mean(TemperatureCMax, na.rm = TRUE),
MinTemp = mean(TemperatureCMin, na.rm = TRUE)
)
# Create a bar chart to show monthly average temperatures with error bars for range
monthly_temp_summary_plot <- ggplot(monthly_temp, aes(x = month)) +
geom_bar(aes(y = AvgTemp), stat = "identity", fill = "#FFA07A") + # Bar for average temperature
geom_errorbar(aes(ymin = MinTemp, ymax = MaxTemp), width = 0.2) + # Error bar for temperature range
labs(
title = "Monthly Average Temperature Range",
x = "Month", y = "Temperature (°C)"
) +
theme_minimal() # Use a clean, minimal theme
# Convert the static plot to an interactive Plotly version
ggplotly(monthly_temp_summary_plot)
The bar chart with error bars shows a clear seasonal temperature pattern in Colchester during 2024. Temperatures rise from January, peak in July and August, then decline into winter. Summer months show both higher averages and wider temperature ranges, while winter months are cooler and more stable. Overall, the plot highlights typical seasonal shifts and daily variability in temperature.
# Create a boxplot to compare average daily temperatures across seasons
seasonal_temp_boxplot <- ggplot(temp24, aes(x = season, y = TemperatureCAvg, fill = season)) +
geom_boxplot() + # Display median, quartiles, and outliers
scale_fill_manual(values = c(
"Winter" = "#4682B4", # Steel blue
"Spring" = "#90EE90", # Light green
"Summer" = "#FFD700", # Gold
"Autumn" = "#FF8C00" # Dark orange
)) +
labs(
title = "Average Temperature by Season",
x = "Season", y = "Average Temperature (°C)"
) +
theme_minimal() # Apply a clean, minimal style
# Convert the static boxplot to an interactive Plotly version
ggplotly(seasonal_temp_boxplot)
The boxplot highlights a clear seasonal shift in temperatures in Colchester during 2024. Summer is the warmest season, with a median above 17°C, while winter is the coldest, with values near 7°C and some days below freezing. Spring and autumn fall in between, with similar medians around 11–12°C. These patterns align with the expected seasonal temperature cycle.
# Create a density plot to show the distribution of daily precipitation values
precip_density_plot <- ggplot(temp24, aes(x = Precmm)) +
geom_density(fill = "#6495ED", alpha = 0.7, color = "black") + # Fill with cornflower blue and black border
labs(
title = "Density of Daily Precipitation",
x = "Precipitation (mm)", y = "Density"
) +
theme_minimal() # Apply a clean minimal theme
# Convert the density plot to an interactive version using Plotly
ggplotly(precip_density_plot)
The density plot shows that most days in Colchester during 2024 were dry or had very light rain, with a strong peak near 0 mm of precipitation. Heavier rainfall (over 10 mm) was rare, and extreme rain events above 20 mm were even less frequent. Overall, the year was marked by mild and mostly dry weather conditions.
# Reshape the data into long format for humidity and pressure
humidity_pressure <- temp24 %>%
select(Date, HrAvg, PresslevHp) %>%
pivot_longer(
cols = c(HrAvg, PresslevHp), # Columns to reshape
names_to = "Variable", # New column for variable names
values_to = "Value" # New column for values
)
# Create a time series line plot of humidity and pressure over the year
humidity_pressure_ts_plot <- ggplot(humidity_pressure, aes(x = Date, y = Value, color = Variable)) +
geom_line() + # Add lines for each variable
labs(
title = "Humidity and Pressure Over Time",
x = "Date", y = "Value", color = "Metric"
) +
theme_minimal() # Apply a minimal visual theme
# Convert the static plot to an interactive Plotly version
ggplotly(humidity_pressure_ts_plot)
The line chart shows that humidity in Colchester remained fairly steady throughout 2024, mostly between 80% and 90%, with only slight fluctuations. Atmospheric pressure varied a bit more, ranging from 1000 to 1025 hPa, with some shifts in late winter and spring. Overall, both variables were stable, indicating consistent atmospheric conditions during the year.
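Humidity (in %) and pressure (in hPa) sit on very different scales, so a shared y-axis flattens the variation in each series. One possible alternative, sketched here under the assumption that separate panels are acceptable, is to facet by variable with free y-scales. The data frame `toy_weather` holds randomly generated, hypothetical values and is not read from temp24.csv.

```r
library(ggplot2)
library(tidyr)

# Hypothetical daily values for illustration only
set.seed(1)
toy_weather <- data.frame(
  Date       = seq(as.Date("2024-01-01"), by = "day", length.out = 30),
  HrAvg      = runif(30, min = 70, max = 95),
  PresslevHp = runif(30, min = 1000, max = 1030)
)

# Long format: one row per date and variable
toy_long <- pivot_longer(
  toy_weather,
  cols = c(HrAvg, PresslevHp),
  names_to = "Variable",
  values_to = "Value"
)

# One panel per variable, each with its own y-axis range
faceted_plot <- ggplot(toy_long, aes(x = Date, y = Value)) +
  geom_line() +
  facet_wrap(~ Variable, ncol = 1, scales = "free_y") +
  labs(title = "Humidity and Pressure Over Time (separate scales)",
       x = "Date", y = "Value") +
  theme_minimal()
```

With `scales = "free_y"`, day-to-day changes in each variable become visible instead of appearing as two nearly flat lines.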
# Classify each day as "Wet" if precipitation ≥ 1 mm, otherwise "Dry"
temp24$rain_type <- ifelse(temp24$Precmm >= 1, "Wet", "Dry")
# Create a bar plot comparing the number of wet and dry days
wet_dry_days_plot <- ggplot(temp24, aes(x = rain_type, fill = rain_type)) +
geom_bar(color = "black") + # Add black borders to bars
scale_fill_manual(values = c("Dry" = "#FFD700", "Wet" = "#00BFFF")) + # Gold for dry, blue for wet
labs(
title = "Count of Wet vs Dry Days in 2024",
x = "Day Type", y = "Number of Days"
) +
theme_minimal() # Apply a clean minimal theme
# Convert the static plot to an interactive version using Plotly
ggplotly(wet_dry_days_plot)
The bar chart shows that dry days far outnumbered wet days in Colchester during 2024, with more than twice as many dry days recorded. This supports the earlier findings that the year was mostly dry, with rainfall occurring only occasionally, confirming a mild and dry climate pattern overall.
# Create a density plot showing the distribution of daily visibility in kilometers
visibility_density_plot <- ggplot(temp24, aes(x = VisKm)) +
geom_density(fill = "#6A5ACD", alpha = 0.7, color = "black") + # Purple fill with black outline
labs(
title = "Density of Visibility (km)",
x = "Visibility (km)", y = "Density"
) +
theme_minimal() # Apply a clean minimal theme
# Convert the static density plot to an interactive Plotly chart
ggplotly(visibility_density_plot)
The visibility distribution shows that in 2024, most days in Colchester had good visibility, typically between 30 and 40 km. Low-visibility days were rare, indicating that foggy or hazy conditions were uncommon. Overall, the city experienced clear and stable atmospheric conditions throughout the year.
# Create a histogram showing the distribution of low cloud cover (in octas)
low_cloud_hist_plot <- ggplot(temp24, aes(x = lowClOct)) +
geom_histogram(bins = 10, fill = "#708090", color = "black") + # Slate gray fill with black borders
labs(
title = "Distribution of Low Cloud Cover (Octas)",
x = "Low Cloud Cover (0–8)", y = "Frequency"
) +
theme_minimal() # Apply a clean, minimal theme
# Convert the static histogram to an interactive Plotly version
ggplotly(low_cloud_hist_plot)
The histogram shows that low cloud cover in Colchester during 2024 was most frequently between 6 and 8 octas, indicating mostly cloudy to overcast conditions on many days. Clearer skies (low cloud cover below 4 octas) were relatively rare. Overall, the city experienced frequent cloudiness throughout the year.
# Create a scatter plot to show the relationship between wind speed and wind gusts
wind_speed_gust_plot <- ggplot(temp24, aes(x = WindkmhInt, y = WindkmhGust)) +
geom_point(alpha = 0.5, color = "#20B2AA") + # Semi-transparent teal points
geom_smooth(method = "lm", se = FALSE, color = "red") + # Add linear trend line in red
labs(
title = "Correlation between Wind Speed and Wind Gust",
x = "Wind Speed (km/h)", y = "Wind Gust (km/h)"
) +
theme_minimal() # Apply a clean, minimal theme
# Convert the scatter plot to an interactive Plotly version
ggplotly(wind_speed_gust_plot)
## `geom_smooth()` using formula = 'y ~ x'
The scatter plot shows a strong positive correlation between wind speed and wind gusts in Colchester during 2024. As wind speed increases, wind gusts also tend to rise, following a fairly linear pattern. This suggests that on windier days, stronger gusts are more likely, which is typical in weather dynamics.
# Prepare frequency data for wind directions
wind_dir_freq <- temp24 %>%
count(WindkmhDir, name = "Count") %>%
rename(Direction = WindkmhDir) # Rename for clarity
# Define a custom color palette using the viridis scale
custom_colors <- viridis(length(unique(wind_dir_freq$Direction)))
# Create a polar bar chart (wind rose) using Plotly
plot_ly(wind_dir_freq,
type = 'barpolar', # Use polar bar chart
r = ~Count, # Radius represents count of days
theta = ~Direction, # Angle represents wind direction
color = ~Direction, # Use direction to define color groups
colors = custom_colors) %>%
layout(
title = "Wind Direction Rose Chart (2024)",
polar = list(
angularaxis = list(
direction = "clockwise", # Rotate clockwise
rotation = 90 # Start from North
),
radialaxis = list(
showticklabels = FALSE, # Hide tick labels for cleaner look
ticks = "" # Remove tick marks
)
),
showlegend = FALSE # Hide legend for simplicity
)
The wind rose chart shows that westerly and southwesterly winds were the most frequent in Colchester during 2024, while easterly and northeasterly winds were less common. This pattern reflects the prevailing wind directions typical of the UK’s maritime climate, providing useful context for understanding weather patterns and environmental conditions.
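A wind rose is easiest to read when the angular axis follows compass order rather than alphabetical or data order. The sketch below shows one way to enforce that order, assuming the directions use the standard 16-point compass labels (only some of which appear in the output shown in this report). The counts in `toy_wind` are hypothetical, and `compass` and `wind_rose` are illustrative names.

```r
library(plotly)

# Standard 16-point compass, clockwise from North (assumed to match WindkmhDir labels)
compass <- c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
             "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")

# Hypothetical counts of days per direction, for illustration only
toy_wind <- data.frame(
  Direction = c("WSW", "W", "SW", "S", "NE", "E"),
  Count     = c(60, 55, 50, 30, 15, 10)
)

# Store directions as a factor whose levels follow compass order
toy_wind$Direction <- factor(toy_wind$Direction, levels = compass)

# Polar bar chart with the angular categories fixed to compass order
wind_rose <- plot_ly(toy_wind, type = "barpolar",
                     r = ~Count, theta = ~Direction) %>%
  layout(polar = list(angularaxis = list(
    direction     = "clockwise",
    rotation      = 90,        # put the first category (N) at the top
    categoryorder = "array",
    categoryarray = compass
  )))
```

Passing the full `compass` vector as `categoryarray` also reserves a slot for directions with no observations, so gaps in the rose reflect genuinely absent winds.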
# Format dates for merging: extract "YYYY-MM" from each date
crime24$merge_date <- format(crime24$date, "%Y-%m")
temp24$merge_date <- format(temp24$Date, "%Y-%m")
# Aggregate weather data by month: calculate mean values for each variable
temp_monthly <- temp24 %>%
group_by(merge_date) %>%
summarise(
avg_temp = mean(TemperatureCAvg, na.rm = TRUE),
max_temp = mean(TemperatureCMax, na.rm = TRUE),
min_temp = mean(TemperatureCMin, na.rm = TRUE),
humidity = mean(HrAvg, na.rm = TRUE),
pressure = mean(PresslevHp, na.rm = TRUE),
precipitation = mean(Precmm, na.rm = TRUE)
)
# Aggregate total number of crimes by month
crime_monthly <- crime24 %>%
group_by(merge_date) %>%
summarise(
total_crimes = n()
)
# Merge crime and weather datasets by month
weather_crime <- left_join(crime_monthly, temp_monthly, by = "merge_date")
# View the structure of the merged dataset
as.data.frame(head(weather_crime))
## merge_date total_crimes avg_temp max_temp min_temp humidity pressure
## 1 2024-01 529 4.251613 7.348387 0.7419355 83.11935 1015.652
## 2 2024-02 546 7.682759 11.041379 4.0482759 87.22759 1009.383
## 3 2024-03 502 8.135484 11.441935 4.5258065 83.28710 1005.510
## 4 2024-04 471 9.083333 13.393333 4.5500000 78.56333 1012.100
## 5 2024-05 568 13.396774 18.277419 8.4645161 82.49032 1012.271
## 6 2024-06 490 14.323333 19.670000 7.7733333 73.88667 1014.133
## precipitation
## 1 1.735484
## 2 3.193103
## 3 1.883871
## 4 1.813333
## 5 2.600000
## 6 0.840000
# Remove the merge_date column and compute a correlation matrix for numeric variables
cor_matrix <- cor(weather_crime[, -1], use = "complete.obs")
# Reshape the correlation matrix into long format for plotting
cor_melted <- melt(cor_matrix)
# Round correlation values to two decimal places for labels
cor_melted$label <- round(cor_melted$value, 2)
# Create an interactive correlation heatmap using Plotly
crime_weather_corr_plot <- plot_ly(
data = cor_melted,
x = ~Var1,
y = ~Var2,
z = ~value,
type = "heatmap",
text = ~label, # Display correlation values as text
texttemplate = "%{text}", # Format text appearance
textfont = list(color = "black", size = 12),
hoverinfo = "text", # Show only text on hover
colorscale = "YlGnBu", # Apply Yellow-Green-Blue sequential color scale
zmin = -1, zmax = 1, # Set color scale limits
colorbar = list(title = "Correlation")
) %>%
layout(
title = "Correlation Between Crime & Weather Variables",
xaxis = list(title = "", tickangle = -45),
yaxis = list(title = "")
)
# Render the interactive correlation heatmap
crime_weather_corr_plot
The correlation matrix shows that monthly crime counts in Colchester during 2024 are moderately positively correlated with temperature and precipitation, suggesting higher crime in warmer and wetter months. In contrast, humidity and pressure show weak negative correlations, indicating little linear association with crime levels. Because each coefficient rests on only twelve monthly observations, these values describe association rather than causation and carry wide uncertainty. Overall, temperature and rainfall appear to have the strongest links to crime trends.
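A correlation coefficient from a handful of monthly points is easier to interpret alongside a confidence interval and p-value. The following sketch applies base R's `cor.test()` to the six months printed in the merged table above (temperatures rounded to two decimals), so its result is illustrative and will differ from the twelve-month coefficient in the heatmap. The name `six_months` is an assumption.

```r
# January to June 2024 rows from the merged table above; the full analysis uses twelve months
six_months <- data.frame(
  total_crimes = c(529, 546, 502, 471, 568, 490),
  avg_temp     = c(4.25, 7.68, 8.14, 9.08, 13.40, 14.32)
)

# Pearson correlation with a 95% confidence interval and a two-sided test
ct <- cor.test(six_months$total_crimes, six_months$avg_temp,
               method = "pearson", conf.level = 0.95)

print(ct$estimate)  # sample correlation coefficient
print(ct$conf.int)  # 95% confidence interval
print(ct$p.value)   # p-value for the null hypothesis of zero correlation
```

Running the same call on the full `weather_crime` table would show whether the interval for each weather variable excludes zero.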
# Plot 1: Relationship between average temperature and total crimes
p1 <- ggplot(weather_crime, aes(x = avg_temp, y = total_crimes)) +
geom_point(color = "#E67E22", size = 3) + # Orange points
geom_smooth(method = "lm", color = "red", se = FALSE) + # Linear regression line
labs(
title = "Avg Temperature vs Crime",
x = "Average Temperature (°C)",
y = "Total Crimes"
) +
theme_minimal()
# Plot 2: Relationship between average humidity and total crimes
p2 <- ggplot(weather_crime, aes(x = humidity, y = total_crimes)) +
geom_point(color = "#3498DB", size = 3) + # Blue points
geom_smooth(method = "lm", color = "red", se = FALSE) + # Linear regression line
labs(
title = "Humidity vs Crime",
x = "Average Humidity (%)",
y = "Total Crimes"
) +
theme_minimal()
# Plot 3: Relationship between average pressure and total crimes
p3 <- ggplot(weather_crime, aes(x = pressure, y = total_crimes)) +
geom_point(color = "#2ECC71", size = 3) + # Green points
geom_smooth(method = "lm", color = "red", se = FALSE) + # Linear regression line
labs(
title = "Pressure vs Crime",
x = "Average Pressure (hPa)",
y = "Total Crimes"
) +
theme_minimal()
# Convert each ggplot object into interactive Plotly plots
p1_plot <- ggplotly(p1)
## `geom_smooth()` using formula = 'y ~ x'
p2_plot <- ggplotly(p2)
## `geom_smooth()` using formula = 'y ~ x'
p3_plot <- ggplotly(p3)
## `geom_smooth()` using formula = 'y ~ x'
# Combine the three interactive plots into a single row layout
subplot(p1_plot, p2_plot, p3_plot, nrows = 1, margin = 0.05, titleX = TRUE, titleY = TRUE) %>%
layout(
title = list(
text = "Weather Factors and Their Relationship with Crime",
x = 0.5, # Center the main title
xanchor = "center",
y = 0.95
)
)
The scatter plots show that crime in Colchester tends to rise with higher temperatures, suggesting a positive link between warmth and criminal activity. In contrast, humidity and pressure show slight negative correlations with crime, though these relationships are weaker. Overall, temperature appears to be the strongest weather-related factor associated with monthly crime patterns in 2024.
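The red lines in these panels are ordinary least-squares fits, so the same model can be fitted directly to read off the slope (extra crimes per additional degree) and how much of the variation it explains. This is a minimal sketch on simulated values; `demo` is a hypothetical stand-in for `weather_crime`.

```r
# Simulated monthly data using the report's column names (values are illustrative)
set.seed(1)
demo <- data.frame(avg_temp = seq(4, 19, length.out = 12))
demo$total_crimes <- round(950 + 10 * demo$avg_temp + rnorm(12, sd = 30))

# Same linear model that geom_smooth(method = "lm") draws
fit <- lm(total_crimes ~ avg_temp, data = demo)

slope    <- coef(fit)[["avg_temp"]]                 # change in crimes per 1 degree C
slope_ci <- confint(fit, "avg_temp", level = 0.95)  # 95% interval for the slope
r2       <- summary(fit)$r.squared                  # share of variance explained

slope
slope_ci
r2
```

Repeating the fit with `humidity` or `pressure` as the predictor would allow the three panels to be compared on the same numeric footing.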
# Categorize each month as "Wet" or "Dry" based on average monthly precipitation
weather_crime$rain_type <- ifelse(weather_crime$precipitation >= 1, "Wet", "Dry")
# Create a boxplot comparing total crime counts between wet and dry months
wet_dry_crime_boxplot <- ggplot(weather_crime, aes(x = rain_type, y = total_crimes, fill = rain_type)) +
geom_boxplot() + # Display distribution, median, and range of crimes per group
scale_fill_manual(values = c("Wet" = "#1F78B4", "Dry" = "#FFD700")) + # Blue for wet, gold for dry
labs(
title = "Crime Comparison: Wet vs Dry Months",
x = "Month Type", y = "Total Crimes"
) +
theme_minimal() # Apply a clean minimal theme
# Convert the static boxplot into an interactive Plotly version
ggplotly(wet_dry_crime_boxplot)
The boxplot compares crime levels between wet and dry months in Colchester for 2024, where a month counts as wet if its average precipitation is at least 1 (the threshold set in the code). Wet months tend to have slightly higher crime counts and a wider range of variation. This points to a modest association between rainfall and crime, but with only 12 months split across two groups the difference may reflect chance or seasonal timing as much as rain itself.
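A visual gap between two boxes built from a handful of months is hard to judge by eye. One option, sketched here on simulated values, is a Wilcoxon rank-sum test, which compares the two groups without assuming normality. The data frame `demo` is hypothetical; the `>= 1` threshold mirrors the one used for `rain_type` in the report.

```r
# Simulated monthly precipitation and crime counts (illustrative values only)
set.seed(7)
demo <- data.frame(
  precipitation = c(0.4, 1.6, 2.1, 0.8, 1.2, 0.3, 0.9, 1.8, 2.4, 1.1, 0.6, 1.4)
)
demo$rain_type    <- ifelse(demo$precipitation >= 1, "Wet", "Dry")
demo$total_crimes <- round(1000 + 20 * demo$precipitation + rnorm(12, sd = 50))

# Group sizes and medians, matching what the boxplot displays
table(demo$rain_type)
aggregate(total_crimes ~ rain_type, data = demo, FUN = median)

# Rank-based two-sample test; exact = FALSE uses the normal approximation
rain_test <- wilcox.test(total_crimes ~ rain_type, data = demo, exact = FALSE)
rain_test$p.value
```

With group sizes this small the test has little power, so a large p-value should be read as "no clear evidence" rather than "no difference".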
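# Join crime and weather datasets by merge_date (month), allowing multiple matches.
# Caution: every crime row is repeated once for each weather row sharing its merge_date,
# so row counts in `combined` are inflated relative to crime24.
combined <- left_join(crime24, temp24, by = "merge_date", relationship = "many-to-many")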
# Categorize temperature into three ranges using quantiles: Low, Medium, High
combined$temp_range <- cut(
combined$TemperatureCAvg,
breaks = quantile(combined$TemperatureCAvg, probs = c(0, 0.33, 0.66, 1), na.rm = TRUE),
labels = c("Low", "Medium", "High"),
include.lowest = TRUE
)
# Create a proportional bar plot showing crime category distribution by temperature range
crime_by_temp_range_plot <- ggplot(combined, aes(x = temp_range, fill = category)) +
geom_bar(position = "fill") + # Fill bars proportionally by category
labs(
title = "Crime Type Distribution by Temperature Range",
x = "Temperature Range", y = "Proportion",
fill = "Crime Category"
) +
scale_fill_viridis_d(option = "magma") + # Apply a perceptually uniform color palette
theme_minimal() # Use a clean visual theme
# Convert the static plot to an interactive Plotly version
ggplotly(crime_by_temp_range_plot)
The chart shows that while most crime types occur consistently across temperature ranges, anti-social behaviour and public order offences are slightly more common in warmer conditions. In contrast, crimes like vehicle crime and shoplifting remain steady regardless of temperature. This suggests that some crime types may be more influenced by temperature than others.
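Since the crime records carry only a month while the weather readings are daily, an alternative worth considering is to average the weather to one row per month before joining, so that each crime appears exactly once. The sketch below illustrates the idea on toy tables; `crime_demo`, `temp_demo` and the `"YYYY-MM"` format of `merge_date` are assumptions for illustration, and the `relationship` argument needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical stand-ins for crime24 and temp24 (column names follow the report)
crime_demo <- tibble(
  merge_date = c("2024-01", "2024-01", "2024-07"),
  category   = c("shoplifting", "violent-crime", "anti-social-behaviour")
)
temp_demo <- tibble(
  merge_date      = c("2024-01", "2024-01", "2024-07", "2024-07"),
  TemperatureCAvg = c(3.0, 5.0, 18.0, 20.0)
)

# Collapse the daily readings to a single mean per month
temp_monthly <- temp_demo %>%
  group_by(merge_date) %>%
  summarise(TemperatureCAvg = mean(TemperatureCAvg, na.rm = TRUE), .groups = "drop")

# Many-to-one join: dplyr raises an error if a month matches more than one weather row
combined_monthly <- left_join(crime_demo, temp_monthly,
                              by = "merge_date", relationship = "many-to-one")

nrow(combined_monthly) == nrow(crime_demo)  # the join adds no duplicate crime rows
```

With this shape, the temperature bands describe months rather than individual days, which matches the resolution of the crime data.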
# Categorize humidity into two bins: Low and High based on median split
combined$humidity_range <- cut(
combined$HrAvg,
breaks = quantile(combined$HrAvg, probs = c(0, 0.5, 1), na.rm = TRUE),
labels = c("Low", "High"),
include.lowest = TRUE
)
# Create a violin plot to visualize humidity distribution across crime categories
humidity_crime_violin_plot <- ggplot(combined, aes(x = category, y = HrAvg, fill = humidity_range)) +
geom_violin(scale = "width", trim = FALSE, color = "black") + # Violin plot with uniform width
labs(
title = "Distribution of Humidity by Crime Category",
x = "Crime Category", y = "Humidity (%)",
fill = "Humidity Range" # Label the legend for the low/high split
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for readability
# Convert the violin plot to an interactive Plotly chart
ggplotly(humidity_crime_violin_plot)
The violin plot shows that every crime category occurs under both low and high humidity, with broadly similar distributions. Violent and public order offences sit slightly towards higher humidity, which hints at a minor tendency rather than a clear effect. Because the violins are scaled to equal width, they show the shape of each distribution and not the number of crimes.
# Filter out rows with missing latitude or longitude to ensure valid map points
crime_map_data <- crime24 %>% filter(!is.na(lat), !is.na(long))
# Create an interactive leaflet map with marker clustering for crime incidents
leaflet(crime_map_data) %>%
addTiles() %>% # Add default OpenStreetMap tile layer
addMarkers(
lng = ~long, lat = ~lat, # Use coordinates from dataset
clusterOptions = markerClusterOptions(), # Enable clustering of nearby markers
popup = ~paste("Crime:", category, "<br>", "Street:", street_name) # Display info on click
) %>%
setView(
lng = mean(crime_map_data$long, na.rm = TRUE), # Center the map on the average longitude
lat = mean(crime_map_data$lat, na.rm = TRUE), # Center the map on the average latitude
zoom = 13 # Set zoom level for detailed town view
)
The interactive leaflet map with clustering provides a spatial overview of crime hotspots across Colchester in 2024. Each numbered circle represents a cluster of reported crimes in a geographic area, with the number giving the total incidents grouped there at the current zoom level. Cluster colours shift from green through yellow to orange as the count rises, so orange circles mark the highest concentrations.
From the map, it’s clear that the town center—particularly the vicinity around Colchester Town Station and Southway—emerges as a primary hotspot, with cluster counts exceeding 1,100 crimes in some zones. Other significant concentrations appear along Butt Road, Hythe Hill, and near St Nicholas Street, suggesting these may be high-traffic or densely populated areas. In contrast, peripheral areas such as Cymbeline Meadows or Abbey Field show much lower crime frequencies, with some clusters having fewer than 10 incidents.
In summary, this clustered map helps visualize how crime is spatially distributed, highlighting urban centers as key hotspots. It enables authorities and policymakers to target specific areas for surveillance, prevention, or community engagement efforts.
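Hotspots read off an interactive map depend on the zoom level, so a simple tabulation by street can give a reproducible ranking to set alongside it. The sketch below counts incidents per `street_name`; the street labels and the data frame `crime_demo` are placeholders, not locations from the dataset.

```r
# Placeholder incident records; street labels are hypothetical
crime_demo <- data.frame(
  street_name = c("Street A", "Street A", "Street A",
                  "Street B", "Street B", "Street C")
)

# Count incidents per street and sort from most to fewest
street_counts <- sort(table(crime_demo$street_name), decreasing = TRUE)

# Show the three streets with the most recorded incidents
head(street_counts, 3)
```

Applied to `crime_map_data`, the same two lines would list the streets behind the largest clusters on the map.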
# Prepare data for K-means clustering
crime_k <- crime_map_data %>%
select(lat, long) %>%
na.omit()
# Run K-means clustering (4 clusters here, adjust as needed)
set.seed(123)
crime_kmeans <- kmeans(crime_k, centers = 4)
# Add cluster assignments to your spatial data
crime_map_data$cluster <- factor(crime_kmeans$cluster)
# Define a color palette for clusters
cluster_palette <- colorFactor(
palette = c("red", "blue", "green", "purple"),
domain = crime_map_data$cluster
)
# Create a Leaflet map showing crime clusters from K-means results
leaflet(crime_map_data) %>%
addTiles() %>% # Add default OpenStreetMap tiles
addCircleMarkers(
lng = ~long,
lat = ~lat,
color = ~cluster_palette(cluster), # Color markers by cluster
radius = 5,
stroke = FALSE,
fillOpacity = 0.6,
popup = ~paste("Cluster:", cluster, "<br>Crime:", category, "<br>Street:", street_name)
) %>%
setView(
lng = mean(crime_map_data$long, na.rm = TRUE),
lat = mean(crime_map_data$lat, na.rm = TRUE),
zoom = 14
) %>%
addLegend("bottomright",
pal = cluster_palette,
values = ~cluster,
title = "Crime Clusters",
opacity = 0.8)
The K-means clustering map shows how crime locations in Colchester are grouped spatially by their coordinates. With four cluster centres, the algorithm partitions the incidents into four compact groups and assigns each a colour.
From the visualization, we can distinguish four spatial clusters. Cluster 1 (red) is concentrated in the southwestern zone, Cluster 2 (blue) covers the eastern region, Cluster 3 (green) appears in the southern-central part, and Cluster 4 (purple) is focused in the northern stretch of the town center. This geographic separation is a product of the method, since K-means on coordinates always yields spatially distinct groups; on its own it does not show that these neighborhoods have different crime patterns. Comparing the mix of crime categories within each cluster would be needed to establish that.
These insights are valuable for law enforcement and urban planners. Cluster-specific trends could indicate hotspots for nightlife, commercial activity, or residential vulnerability. As a result, interventions—such as increased patrols, surveillance, or community outreach—can be strategically localized, allowing for more effective and efficient crime prevention efforts across the town.
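The choice of four clusters is a setting, not a result, so it may be worth checking how the fit changes with the number of centres. A common heuristic is the elbow method: run K-means for a range of k and look for the point where the total within-cluster sum of squares stops falling sharply. The coordinates below are simulated around four made-up centres and are not real crime locations.

```r
set.seed(123)

# Simulated points scattered around four hypothetical centres
centres <- data.frame(
  lat  = c(51.885, 51.890, 51.895, 51.880),
  long = c(0.890, 0.905, 0.900, 0.915)
)
coords_demo <- data.frame(
  lat  = rep(centres$lat,  each = 50) + rnorm(200, sd = 0.002),
  long = rep(centres$long, each = 50) + rnorm(200, sd = 0.002)
)

# Total within-cluster sum of squares for k = 1..8;
# nstart = 25 reruns each fit from 25 random starts and keeps the best
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(coords_demo, centers = k, nstart = 25)$tot.withinss
})

# Elbow plot: look for the k after which the curve flattens
plot(k_values, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

Note that K-means here treats latitude and longitude as plain numbers; at Colchester's latitude a degree of longitude covers less ground than a degree of latitude, so projecting the coordinates first would give distances closer to those on the ground.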
# Group crime data by month and category, and count the number of incidents
crime_time <- crime24 %>%
group_by(month_year = format(date, "%Y-%m"), category) %>%
summarise(count = n(), .groups = 'drop')
# Create an interactive multi-line chart showing monthly crime trends per category
plot_ly(
data = crime_time,
x = ~month_year, # X-axis: month and year
y = ~count, # Y-axis: number of crimes
color = ~category, # Color lines by crime category
colors = viridis_pal(option = "D")(length(unique(crime_time$category))), # Apply perceptually uniform palette
type = 'scatter', mode = 'lines+markers' # Line plot with markers
) %>%
layout(
title = "Interactive Monthly Crime by Category",
xaxis = list(title = "Month"),
yaxis = list(title = "Crime Count")
)
The interactive chart shows monthly crime trends by category in 2024. Violent crime is the most frequent throughout the year, followed by anti-social behaviour and shoplifting. Some categories remain low and stable, such as bicycle theft and possession of weapons. Overall, the chart highlights seasonal fluctuations and helps identify which crime types dominate each month.
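Raw counts make it hard to tell whether a category grew or the whole month was simply busier. Converting each count to a share of its month's total separates the two. The sketch below uses a toy version of `crime_time`; the counts are invented for illustration.

```r
# Toy monthly counts in the same shape as crime_time (values are invented)
crime_time_demo <- data.frame(
  month_year = rep(c("2024-01", "2024-02"), each = 3),
  category   = rep(c("violent-crime", "anti-social-behaviour", "shoplifting"), times = 2),
  count      = c(50, 30, 20, 40, 40, 20)
)

# Total crimes in each month, repeated alongside every row of that month
month_totals <- ave(crime_time_demo$count, crime_time_demo$month_year, FUN = sum)

# Share of the month's crimes that fall in each category
crime_time_demo$share <- crime_time_demo$count / month_totals

crime_time_demo
```

Plotting `share` in place of `count` would show changes in the composition of crime independently of its overall volume.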
# Prepare monthly crime data by summarising the number of incidents per category each month
crime_anim_data <- crime24 %>%
mutate(month = format(date, "%Y-%m")) %>%
group_by(month, category) %>%
summarise(crime_count = n(), .groups = 'drop')
# Create a base ggplot object for animation
crime_anim_plot <- ggplot(crime_anim_data, aes(x = reorder(category, -crime_count), y = crime_count, fill = category)) +
geom_col(show.legend = FALSE) + # Use column chart without legend
labs(
title = 'Monthly Crime by Category: {closest_state}', # Dynamic title showing current month
x = 'Crime Category',
y = 'Number of Crimes'
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + # Tilt x-axis labels for clarity
transition_states(month, transition_length = 2, state_length = 1) + # Animate across months
ease_aes('cubic-in-out') # Smooth easing for transitions
# Render the animation as a GIF (requires gifski package)
animate(crime_anim_plot, nframes = 100, fps = 10, renderer = gifski_renderer())
The animated bar chart visualizes how crime categories changed month-by-month throughout 2024. It clearly shows violent crime consistently dominating across all months, followed by anti-social behaviour and shoplifting. While the order of less frequent crimes shifts slightly, overall trends remain stable.
This animation provides a dynamic view of seasonal or monthly fluctuations in crime types, making it easier to spot peaks and patterns over time. It’s especially useful for identifying months with spikes in certain offences and tracking how priorities may shift throughout the year.
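One caveat when reading the animation: `reorder(category, -crime_count)` orders the bars by each category's mean count over all months, so the x-axis order does not change between frames. To see whether the ranking itself shifts from month to month, ranks can be computed within each month, as in this sketch on invented counts (`crime_anim_demo` is a hypothetical stand-in for `crime_anim_data`).

```r
# Invented counts in the same shape as crime_anim_data
crime_anim_demo <- data.frame(
  month       = rep(c("2024-01", "2024-02"), each = 3),
  category    = rep(c("violent-crime", "anti-social-behaviour", "shoplifting"), times = 2),
  crime_count = c(50, 30, 20, 40, 45, 20)
)

# Rank categories within each month; rank 1 is the most frequent category
crime_anim_demo$rank <- ave(
  -crime_anim_demo$crime_count,
  crime_anim_demo$month,
  FUN = function(x) rank(x, ties.method = "min")
)

crime_anim_demo
```

A table of ranks by month makes any change in the leading categories explicit, which is difficult to judge from bar heights alone.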
# Prepare spatial crime data with valid coordinates and a monthly timestamp
crime_map_anim <- crime24 %>%
filter(!is.na(lat) & !is.na(long)) %>%
mutate(month = format(date, "%Y-%m"))
# Identify the top 5 most frequent crime categories
top_categories <- crime_map_anim %>%
count(category, sort = TRUE) %>%
slice_max(n, n = 5) %>% # Rank explicitly by the count column n (ties are kept)
pull(category)
# Filter the dataset to include only the top 5 crime categories
crime_map_anim <- crime_map_anim %>%
filter(category %in% top_categories)
# Define custom color palette for selected crime categories
custom_colors <- c("red", "pink", "blue", "green", "black")
# Load UK map boundaries for base layer
uk_map <- map_data("world", region = "UK")
# Create an animated plot showing monthly changes in crime hotspots
map_anim_plot <- ggplot() +
geom_polygon(data = uk_map, aes(x = long, y = lat, group = group),
fill = "white", color = "gray80") + # Base map with light outline
geom_point(data = crime_map_anim, aes(x = long, y = lat, color = category),
alpha = 0.7, size = 2) + # Plot crime points with transparency
scale_color_manual(values = custom_colors) + # Apply custom colors
coord_fixed(xlim = range(crime24$long, na.rm = TRUE), ylim = range(crime24$lat, na.rm = TRUE)) + # Fix the aspect ratio and limit the view to the data range, ignoring missing coordinates
labs(
title = "Monthly Crime Hotspots in Colchester: {closest_state}",
subtitle = "Top 5 Crime Categories",
x = "Longitude", y = "Latitude"
) +
theme_minimal() +
theme(legend.position = "right") +
transition_states(month, transition_length = 2, state_length = 1) + # Animate by month
ease_aes("linear") # Smooth linear transition
# Render the animation as a GIF
animate(map_anim_plot, nframes = 100, fps = 10, renderer = gifski_renderer())
The animated hotspot map showcases how the top five crime categories are spatially distributed across Colchester each month in 2024. Each colored point represents an incident from a leading category such as violent crime, anti-social behaviour, or shoplifting.
From this visualization, we observe that hotspots are consistently concentrated around central Colchester, with activity fluctuating across months. Violent crimes (black) appear frequently and are widely dispersed, while shoplifting (green) and public order offences (blue) are more centralized. The dynamic movement and density of these hotspots reveal temporal patterns that can aid law enforcement in allocating resources and planning patrols more effectively.
# Select relevant columns for animation (daily average temperature)
temp_anim <- temp24 %>%
select(Date, TemperatureCAvg)
# Create a line plot of daily average temperature
temp_plot <- ggplot(temp_anim, aes(x = Date, y = TemperatureCAvg)) +
geom_line(color = "purple", linewidth = 0.7) + # Plot in purple; linewidth replaces size for lines in ggplot2 >= 3.4.0
labs(
title = "Daily Average Temperature in Colchester (2024)",
x = "Date", y = "Temperature (°C)"
) +
theme_bw() + # Use classic black-and-white theme
transition_reveal(Date) # Animate reveal over time using the Date variable
# Render the animated plot as a GIF
animate(temp_plot, nframes = 100, fps = 10, renderer = gifski_renderer())
The animated line plot displays the daily average temperature in Colchester throughout 2024. It clearly reflects the expected seasonal pattern—temperatures rise steadily from winter to summer, peaking around July and August, before declining again toward the end of the year.
This smooth fluctuation highlights Colchester’s temperate climate, with noticeable warmth in mid-year and cooler conditions during the early and late months. The animation makes it easy to observe short-term spikes or drops in temperature, possibly linked to brief weather events.
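Daily readings are noisy, so a rolling mean can make the seasonal curve easier to separate from short-lived spikes. The sketch below computes a centred 7-day moving average with `stats::filter()`. The temperatures are simulated from a cosine curve plus noise and are not the station data; `temp_demo` is a hypothetical stand-in for `temp_anim`.

```r
set.seed(2024)

# Simulated daily temperatures for 2024 (a leap year, so 366 days)
temp_demo <- data.frame(
  Date = seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "day")
)
day_of_year <- as.numeric(format(temp_demo$Date, "%j"))
temp_demo$TemperatureCAvg <- 11 - 7 * cos(2 * pi * (day_of_year - 20) / 366) +
  rnorm(nrow(temp_demo), sd = 2)

# Centred 7-day moving average; the first and last three days have no full window
temp_demo$temp_7day <- as.numeric(
  stats::filter(temp_demo$TemperatureCAvg, rep(1 / 7, 7), sides = 2)
)

head(temp_demo, 5)
```

Writing `stats::filter` in full matters here because loading dplyr masks `filter`, as the package start-up messages in this report show.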
This project examined whether weather conditions—specifically temperature, humidity, precipitation, and pressure—influenced crime patterns in Colchester during 2024. Exploratory analysis of both crime and climate data revealed several notable associations.
Seasonal trends showed that crime peaked during warmer months, particularly summer and early autumn, with offences such as anti-social behaviour and public order more common in these periods. This suggests that warmer weather may contribute indirectly to higher crime through increased outdoor activity and social interaction. Colder months saw slightly lower crime, consistent with the idea that cold weather discourages public presence and opportunistic behaviour.
Among weather variables, temperature had the strongest association with total crime, followed by precipitation: wet months recorded slightly higher and more variable crime counts than dry months. Humidity and pressure showed weaker, more category-specific links.
Spatial analysis identified persistent hotspots around central Colchester, especially near nightlife and commercial areas. Clustering analysis revealed distinct crime zones, offering useful insights for targeted resource deployment. Animated and interactive visualisations further highlighted temporal and spatial shifts in crime.
Based on these findings, several recommendations are proposed. Policing efforts could be adjusted seasonally, with increased patrols during warmer months and around hotspot areas. Integrating weather data into early-warning systems may also support proactive crime prevention. Additionally, spatial clustering results could inform strategic CCTV placement and community engagement initiatives.
Future research could build on this project by incorporating additional environmental factors such as wind speed, cloud cover, and visibility, or by examining crime patterns by time of day. Applying predictive modelling techniques—such as logistic regression or machine learning—may improve forecasting of high-risk periods based on weather trends. Comparative studies across other towns or cities could also reveal whether similar weather-crime relationships hold in different urban contexts.
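As one possible starting point for such modelling, monthly crime totals are counts, so a Poisson regression is a natural baseline. This is a swapped-in technique chosen for illustration, not one of the methods named above. The sketch fits crime counts against temperature on simulated data; every value, and the data frame `model_demo`, is hypothetical.

```r
set.seed(99)

# Simulated monthly temperatures and crime counts (illustrative only)
model_demo <- data.frame(
  avg_temp = c(5, 6, 8, 10, 14, 17, 19, 19, 16, 12, 8, 6)
)
model_demo$total_crimes <- rpois(12, lambda = exp(6.8 + 0.01 * model_demo$avg_temp))

# Poisson regression with a log link: models the expected count per month
pois_fit <- glm(total_crimes ~ avg_temp,
                family = poisson(link = "log"), data = model_demo)

# Multiplicative change in expected crimes for each additional degree C
rate_ratio <- exp(coef(pois_fit)[["avg_temp"]])

# Expected monthly crimes at 5 and 20 degrees C under the fitted model
pred <- predict(pois_fit, newdata = data.frame(avg_temp = c(5, 20)),
                type = "response")

rate_ratio
pred
```

Real crime counts often vary more than a Poisson model allows, in which case `family = quasipoisson` or a negative binomial model would be a reasonable next step.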
This study indicates that weather conditions, temperature above all, have a measurable though modest association with crime patterns in Colchester. Integrating environmental insights into crime analysis can support more informed, responsive policing and long-term public safety planning.